FunC v0.5.0: syntax, refactoring, bugfixes, and a testing framework #1026

Closed
wants to merge 30 commits into from

@tolk-vm tolk-vm commented Jun 14, 2024

UPD. We decided not to develop FunC, but to fork it and start developing a new language, named TOLK.
See #1345


FunC v0.5.0 brings a lot of syntax enhancements, as well as internal refactoring coupled with a brand-new testing framework.

A brief changelog (all-in-one)

  1. Traditional comment syntax // and /* is now supported (and preferred)
  2. All functions are impure by default. The keyword impure is deprecated, and its antonym, the keyword pure, is introduced
  3. The keyword method_id is also deprecated as too obscure. It is replaced by get, written on the left: get int seqno() { ... }
  4. Pragmas compute-asm-ltr and allow-post-modification are deprecated (always on)
  5. The FunC compiler auto-inlines simple function wrappers, which lays the groundwork for potential camelCase and stdlib renamings
  6. FunC compiler can drop unused functions from Fift output, activated by #pragma remove-unused-functions
  7. Changed priorities of operators & | ^ to more intuitive ones
  8. Require parentheses on probably wrong bitwise operator usage
  9. Built-in functions are also placed into stdlib
  10. Tremendously enhanced internal framework for testing FunC, a basis for more drastic future language improvements
  11. Some bug fixes, for wasm/Tact in particular
  12. IDE plugin for JetBrains has been updated: it supports the new syntax and introduces a setting "FunC language level", coupled with inspections to remove impure, replace method_id with get, etc.
  13. IDE plugin for VS Code has also been updated in the same manner: it supports new syntax, has the "FunC language level" setting and related diagnostics/quickfixes
  14. Related pull requests to documentation, func-js, and ton-validator (see below)

Now, let's describe each subject in detail.

Traditional comments // and /* */

Prior to v0.5.0, FunC had Lisp-style comments ;; and {- -}.
Now, support for traditional comments // and /* */ has been added.

Both old and new comments still work, without any options or pragmas. So you may just use traditional comments after upgrading, and old code will still compile. stdlib.fc has traditional comments now.

Though ;; is still supported, // is now the preferred choice.

Q: Why is this feature enabled by default, without any pragma?
A: Conceptually, every new syntax feature should be disabled by default and activated with a special compiler option. But for now, we don't have an easy way to pass compiler options in func-js, blueprint, etc. Note that introducing a per-file #pragma is the wrong approach here: to emit a human-readable error when '//' is used without the pragma, the lexer would have to work differently anyway. (This could be controlled by a launch option, but see above.)

Block comments can still be nested, though it's an undocumented feature, since nesting doesn't properly work in most JS highlighters, even in documentation and VS Code.

None of the real-world contracts I've checked are broken by this, so the addition is pretty safe.

Impure by default, pure specifier

Prior to v0.5.0, there was a keyword impure (a function specifier). When it was absent, a function was treated as pure: if its result was unused, the call was deleted by the compiler.

Though this behavior was documented, it was very unexpected for newcomers. For instance, calls to functions that don't return anything (and only throw an exception on mismatch, for example) were silently deleted. The situation was made worse by the fact that FunC didn't validate the function body, allowing impure operations inside pure functions.

Now,

  1. All functions are impure by default, using impure produces a warning
  2. Keyword pure is introduced
  3. All necessary functions in stdlib.fc are marked pure (so, any change in behavior will occur only to user-defined functions)
  4. If pure is used for a user-defined function, the compiler forbids impure operations in its body (exceptions, globals modification, calling non-pure functions, etc.)
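The change in defaults can be modelled with a toy sketch. This is not the compiler's code: the Call record and both predicates are hypothetical names, and the model only captures the rule described above (an unused call is dropped only when the callee counts as pure).

```python
from dataclasses import dataclass

@dataclass
class Call:
    name: str
    marked_pure: bool     # v0.5.0 `pure` specifier present (hypothetical model)
    marked_impure: bool   # legacy `impure` specifier present
    result_used: bool

def survives_v04(call: Call) -> bool:
    # Old rule: no `impure` means pure, and an unused pure call is deleted.
    return call.marked_impure or call.result_used

def survives_v05(call: Call) -> bool:
    # New rule: only an explicit `pure` allows deleting an unused call.
    return (not call.marked_pure) or call.result_used

# A validation helper that returns nothing and carries no specifier at all.
check = Call("check_sender", marked_pure=False, marked_impure=False, result_used=False)
print(survives_v04(check))  # False: the call was silently deleted
print(survives_v05(check))  # True: the call is kept
```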

Note that in the future, we may teach FunC to auto-detect purity. In other words, the absence of pure doesn't mean "a function is impure": it just means "a function is default", which for now is treated as impure given current capabilities. Automatic checks are much better than trusting users to remember to put "impure". Mostly they did not, since it's a very unexpected pattern in the programming world.

Keyword get instead of method_id

Prior to v0.5.0, method_id was used for two purposes:

  1. Mark a function as a get method: int seqno() method_id { ... }
  2. Explicitly assign an internal id for the function: () after_upgrade() method_id(1666) { ... }

In the second case, after_upgrade() cannot be called by name (from a lite-client, TON scanner, blueprint, etc.), because a client will calculate a different id from the hash of the string "after_upgrade". That's why specifying an id explicitly is useless for get methods.
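To make the mismatch concrete, here is a minimal sketch of how a client might derive an id from a method name. It assumes the commonly documented scheme (CRC-16/XMODEM of the name, with bit 0x10000 set); the helper name get_method_id is made up for illustration.

```python
import binascii

def get_method_id(name: str) -> int:
    # Assumption: id = (crc16_xmodem(name) & 0xFFFF) | 0x10000.
    # binascii.crc_hqx with an initial value of 0 computes CRC-16/XMODEM.
    crc16 = binascii.crc_hqx(name.encode("ascii"), 0)
    return (crc16 & 0xFFFF) | 0x10000

print(get_method_id("seqno"))                  # 85143
print(get_method_id("after_upgrade") == 1666)  # False: the name does not map to the explicit id
```

Any id computed this way is at least 0x10000, so it can never coincide with a small explicit id such as 1666.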

To separate these two cases and because the keyword method_id is too obscure, a new syntax has been added:

get int seqno() { ... }

Keyword get is written on the left.

  1. get does not accept an id
  2. method_id should be used with an id: method_id(XXX), for tests and low-level code

Using method_id without an id is still valid (to make old code compile), but produces a warning. It should be replaced with get.

Mixing get and forall makes no sense, but syntactically, get is expected before anything else.

Note that since all functions are impure by default, get methods are too. They are allowed to call non-pure functions, for instance. This is not a problem once we recall that the absence of pure does not mean "impure"; it means "default". So get methods are actually "default".

Deprecate pragmas allow-post-modification and compute-asm-ltr

Using them now produces a warning: they are always enabled.

Note, that this change will most likely change Fift output (and a BoC hash) of your code. It's not bad and not good, it's just a fact. Emitting another "bytecode" after FunC/Fift update is expected, the end user does not rely on it. If your contract has already been deployed, its code won't be changed anyway, it's onchain. If your contract is under development, you are not interested in its hash. If you want to check a hash of an existing deployed contract, you should anyway take the exact version of FunC/Fift/stdlib used for deployment.

Also, I've refactored implementation of allow-post-modification to avoid producing disabled LET instructions in an intermediate representation.

Auto-inline, potential camelCase and stdlib renamings

This is probably one of the hardest commits in this PR.

Say, you just have the code

cell main() {
  return begin_cell().end_cell();
}

which is compiled into

PROGRAM{  
  DECLPROC main  
  main PROC:<{  
    NEWC
    ENDC
  }>  
}END>c

But suppose that, for whatever reason, you don't like the names "begin_cell" and "end_cell": say, you want to use camelCase.

Now, you can just write a function-wrapper:

builder beginCell() { return begin_cell(); }  
cell endCell(builder b) { return end_cell(b); }  
  
cell main() {  
  return beginCell().endCell();  
}

which is compiled exactly the same as original:

PROGRAM{  
  DECLPROC main  
  main PROC:<{  
    NEWC
    ENDC
  }>  
}END>c

Note that beginCell() and endCell() don't even exist in the Fift output. Their usages were replaced by asm commands, so they don't consume gas. No code or declaration was generated for them, so they don't consume storage. The resulting BoC (and its hash) remains unchanged, so using them is safe even in existing contracts.

A function is treated as a wrapper (and auto-inlined) when its body is just return anotherF(...), it passes all arguments to anotherF (probably, changing their order), return types match, it's not a getter, and so on.
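The listed conditions can be sketched as a predicate over a simplified function record. This is an illustrative model only: the Func type and is_simple_wrapper are hypothetical, and the "and so on" conditions applied by the real compiler are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Func:
    name: str
    params: List[str]
    return_type: str
    is_getter: bool = False
    # The body is modelled as a single `return callee(call_args...)`;
    # callee stays None when the body is anything else.
    callee: Optional[str] = None
    call_args: Optional[List[str]] = None
    callee_return_type: Optional[str] = None

def is_simple_wrapper(f: Func) -> bool:
    if f.is_getter or f.callee is None or f.call_args is None:
        return False
    if f.return_type != f.callee_return_type:
        return False
    # Every parameter is passed exactly once; the order may differ.
    return sorted(f.call_args) == sorted(f.params)

begin = Func("beginCell", [], "builder",
             callee="begin_cell", call_args=[], callee_return_type="builder")
reordered = Func("pushTo", ["value", "into"], "tuple",
                 callee="tpush", call_args=["into", "value"], callee_return_type="tuple")
getter = Func("seqno", [], "int", is_getter=True,
              callee="load_seqno", call_args=[], callee_return_type="int")

print(is_simple_wrapper(begin), is_simple_wrapper(reordered), is_simple_wrapper(getter))
# True True False
```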

Such an ability gives a straight way in the future to safely rename functions in stdlib (to put things in order): just add wrappers over existing stdlib functions, marking original ones as deprecated (to be done later). Example:

cell dict::new() { return new_dict(); }

forall X -> tuple tuple_push(tuple into, X value) { return tpush(into, value); }

If we want, we may introduce camelCase for stdlib:

tuple emptyTuple() { return empty_tuple(); }

forall X -> int isNull(X x) { return null?(x); }

Changing arguments order also works. Wrappers around built-in functions also work. Modifier methods ~ also work. From the IDE perspective, it just works out of the box.

Note that you don't write pure when wrapping a pure function. As already said, the absence of "pure" doesn't mean "impure", it just means "default".

Note that FunC handles corner cases correctly. For example, if a wrapper is used as a first-class function:

builder beginCell() { return begin_cell(); }
...
var starter = beginCell;
starter();

Then beginCell() will be codegenerated, and starter will contain a reference to its continuation. Nevertheless, direct calls to beginCell() will still be inlined.

#pragma remove-unused-functions

Now, FunC can exclude unused functions from the Fift output. This is disabled by default and enabled by this pragma.

A function is considered "used" if it has direct calls or indirect references from any other used function. Getters and special functions (main(), recv_internal(), etc.) are considered used a priori: they are the roots of the DFS.
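The reachability rule above amounts to a depth-first traversal of the call graph. A minimal sketch follows; the call graph and function names are invented for illustration, and the real compiler works on its own internal representation.

```python
def used_functions(call_graph, roots):
    """Return the set of functions reachable from the roots (iterative DFS)."""
    used = set()
    stack = list(roots)
    while stack:
        fn = stack.pop()
        if fn in used:
            continue
        used.add(fn)
        # Both direct calls and references count as edges in this model.
        stack.extend(call_graph.get(fn, ()))
    return used

# Hypothetical contract: old_helper calls a used function but is itself never referenced.
call_graph = {
    "recv_internal": ["parse_msg"],
    "parse_msg": ["load_header"],
    "seqno": [],
    "load_header": [],
    "old_helper": ["load_header"],
}
roots = ["recv_internal", "seqno"]  # special functions and getters

used = used_functions(call_graph, roots)
print(sorted(set(call_graph) - used))  # ['old_helper']
```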

Change priorities of operators & | ^

Prior to v0.5.0, code such as if (slices_equal() & status == 1) was parsed as if( (slices_equal()&status) == 1 ). This caused various errors in real-world contracts; just look at these examples:

if (coincidences_count == 1 & (cc_2 == 0) & (cc_3 == 0))

if ((remaining > 0) & ctx_member_withdraw > 0)

if (0 == jetton_amount) | slippage > MAX_SLIPAGE

if (equal_slices?(sender_addr, mp_addr) & end? == true)

Surprisingly, the old parsing sometimes produced the expected result (since true is -1, not 1), but when it did not, the error was very hard to find.

In any case, as you can see, the authors of this code expected & to have lower priority, but it did not. Moreover, this was a bug: in almost all languages, & has lower priority.

So, we are fixing this bug. But since it's a breaking change, we introduce compilation errors in such cases instead of silently compiling the code in a different (though often more expected) way.
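The two readings can be compared numerically. The sketch below models FunC booleans as integers (true is -1, false is 0) and evaluates a & b == k both ways; old_parse and new_parse are illustrative helpers, not compiler functions.

```python
TRUE, FALSE = -1, 0  # FunC has no bool type: true is -1, false is 0

def old_parse(a, b, k):
    # v0.4.x reading: (a & b) == k
    return TRUE if (a & b) == k else FALSE

def new_parse(a, b, k):
    # Reading the authors expected: a & (b == k)
    return a & (TRUE if b == k else FALSE)

# Both operands true, compared with true: the readings agree by luck.
print(old_parse(TRUE, TRUE, TRUE), new_parse(TRUE, TRUE, TRUE))  # -1 -1

# Left operand false, b == k: the old reading reports true, the expected one false.
print(old_parse(FALSE, 0, 0), new_parse(FALSE, 0, 0))  # -1 0
```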

Require parentheses on probably wrong bitwise operator usage

Now, when you try to compile

if (flags & 0xFF != 0)

It leads to an error:

& has lower precedence than ==, probably this code won't work as you expected.  Use parenthesis: either (... & ...) to evaluate it first, or (... == ...) to suppress this error.

Hence, the code should be rewritten:

// either to evaluate it first (our case)
if ((flags & 0xFF) != 0)
// or to emphasize the behavior (not our case here)
if (flags & (0xFF != 0))

Here is an example, where the correct change will be the latter:

if (equal_slices?(addr1, addr2) & should_end == true)
// gives the same error, we need (parenthesis)
if (equal_slices?(addr1, addr2) & (should_end == true))

This can be annoying in some cases:

// this produces an error
return f() & g() & c == 0;
// should be emphasized
return f() & g() & (c == 0);

In general, however, this case is indistinguishable from the flags & 0xFF != 0 example.

Since FunC doesn't have a bool type, & and | are sometimes used as logical operators and sometimes as bitwise ones. That's why we decided to make cases like the above an error, not a warning. The only way to get past such an error is to use parentheses, as you can see. In the future, we'll probably add bool to the language, as well as the && and || operators, which won't require parentheses in obvious cases.

Keeping in mind that bool will probably be implemented in the future, the symbol ! is now reserved: identifiers may not start with it.

To summarize, using bitwise operators together with any comparison operator now requires parentheses. Besides, mixing different operators, as in arg1 & arg2 | arg3, is also an error:

if ((a == 0) & (b == 1) | (c == 2))

error: mixing & with | without parenthesis, probably this code won't work as you expected.  Use parenthesis to emphasize operator precedence.

I've also added a diagnostic for a common mistake with bit shift operators: a << 8 + 1 is equivalent to a << 9, which is probably unexpected.

int result = a << 8 + low_mask;

error: << has lower precedence than +, probably this code won't work as you expected.  Use parenthesis: either (... << ...) to evaluate it first, or (... + ...) to suppress this error.
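Python happens to share this precedence (+ binds tighter than <<), so the difference between the two readings can be checked directly. The values below are arbitrary.

```python
a, low_mask = 3, 1

shift_last = a << 8 + low_mask     # parsed as a << (8 + low_mask)
shift_first = (a << 8) + low_mask  # what the author most likely meant

print(shift_last)   # 1536
print(shift_first)  # 769
```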

Keyword builtin, its usage in stdlib

In FunC, some functions are implemented at the compiler level, not as asm instructions (mostly for optimization, to produce more efficient code when constant parameters are passed to those functions). For example, load_uint(), null?() and throw() are built-ins.

Prior to v0.5.0, such functions either weren't mentioned in stdlib, or were commented out:

// int preload_int(slice s, int len) pure asm "PLDIX";

Now, all built-in functions are placed in stdlib, marked as builtin:

int preload_int(slice s, int len) pure builtin;

Note that, as before, they are available even if you don't include stdlib.fc. The purpose of the builtin keyword is to make stdlib more intelligible.

Tremendously enhanced internal framework for testing FunC

Prior to v0.5.0, there was an auto-tests/tests folder with several .fc files, each containing a comment that described the provided input and the expected output. For example:

{-    
    method_id | in            | out
TESTCASE | 0  | 1 1 1 -1 10 6 | 8 2
-}

There was a run_tests.py script that traversed each file in the folder, extracted such lines from comments, compiled the file to Fift, and executed every test case, comparing the output.

It was okay, it worked, but this framework was very limited. I am speaking not about the number of tests, but about what exactly we can test with such capabilities.
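The extraction step can be sketched as follows. This is not the actual run_tests.py code: the regular expression and the parse_testcases helper are assumptions based only on the comment format shown above.

```python
import re

# One TESTCASE line: method id | input stack | expected output stack.
TESTCASE_RE = re.compile(r"^TESTCASE\s*\|\s*(\d+)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*$")

def parse_testcases(source: str):
    """Collect (method_id, input_values, expected_values) from TESTCASE lines."""
    cases = []
    for line in source.splitlines():
        m = TESTCASE_RE.match(line.strip())
        if m:
            method_id, stack_in, stack_out = m.groups()
            cases.append((int(method_id), stack_in.split(), stack_out.split()))
    return cases

sample = """{-
    method_id | in            | out
TESTCASE | 0  | 1 1 1 -1 10 6 | 8 2
-}"""

print(parse_testcases(sample))
# [(0, ['1', '1', '1', '-1', '10', '6'], ['8', '2'])]
```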

For example, consider functions-wrappers described above:

builder beginCell() { return begin_cell(); }  

...
beginCell()...

Even if we don't do anything inside the FunC compiler, all input-output tests will pass :) Because what we really want to test is that

  1. beginCell() is not codegenerated (no DECLPROC and similar)
  2. usage of beginCell() is replaced with NEWC (not CALLDICT)

Neither of these can be expressed in terms of input-output.

This is only one example; there is much more we may want to test that goes beyond the old testing framework.

I have fully rewritten the internal testing framework and added lots of capabilities to it. Let's look through them.

@compilation_should_fail — checks that FunC compilation fails, and it's expected (this is called "negative tests").
@stderr — checks, when compilation fails, that stderr (compilation error) is expected.
Example:

_ main(s) {  
  var (z, t) = ;  
  
/*  
@compilation_should_fail  
@stderr identifier expected instead of `;`  
@stderr var (z, t) = ;  
*/

@fif_codegen — checks that contents of compiled.fif matches the expected pattern.
@fif_codegen_avoid — checks that it does not match the pattern.
The pattern is a multiline piece of Fift code, optionally with "..." meaning "any lines here". It may contain //stack_comments, which will also be checked.
Example:

... some FunC code

/*
@fif_codegen
"""
test1 PROC:<{  
  //  
  NEWC        //  _5  
  ENDC        //  to_be_ref  
  NEWC        //  to_be_ref _8
  ...
  TRIPLE      //  _27
}>
"""

@fif_codegen_avoid DECLPROC beginCell
*/
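One way such a pattern check could work is sketched below. It is a hypothetical matcher, not the logic of run_tests.py: lines are compared after whitespace normalization, and "..." matches any run of lines, including an empty one.

```python
def matches_pattern(fif_lines, pattern_lines):
    """True if the pattern occurs somewhere in the Fift output."""
    norm = lambda lines: [" ".join(l.split()) for l in lines if l.strip()]
    fif, pat = norm(fif_lines), norm(pattern_lines)

    def match_from(i, j):
        if j == len(pat):
            return True  # whole pattern consumed
        if pat[j] == "...":
            # Let "..." swallow zero or more output lines.
            return any(match_from(k, j + 1) for k in range(i, len(fif) + 1))
        return i < len(fif) and fif[i] == pat[j] and match_from(i + 1, j + 1)

    return any(match_from(start, 0) for start in range(len(fif) + 1))

compiled = ["test1 PROC:<{", "  NEWC", "  ENDC", "  NEWC", "  STU 8", "  TRIPLE", "}>"]
expected = ["test1 PROC:<{", "NEWC", "ENDC", "...", "TRIPLE", "}>"]

print(matches_pattern(compiled, expected))                # True  (@fif_codegen passes)
print(matches_pattern(compiled, ["DECLPROC beginCell"]))  # False (@fif_codegen_avoid passes)
```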

@code_hash — checks that hash of compiled output.fif matches the provided value. It's used to "record" code boc hash and to check that it remains the same on compiler modifications. Being much less flexible than @fif_codegen, it nevertheless gives a guarantee of bytecode stability.
Example:

... some FunC code

/*
@code_hash 13830542019509784148027107880226447201604257839069192762244575629978154217223
*/

Of course, different tags can be mixed up in a single file: multiple TESTCASE, multiple @fif_codegen, etc.

Also, I've rewritten run_tests.js to be fully in sync with run_tests.py. This means that we can now test Fift codegen, compilation errors and so on for WASM as well.

Besides, I've slightly worked on legacy_tests, extracting names/hashes to a separate file shared between py and js.

Chronologically, creating this testing framework was the first thing I did. All the functionality above was developed with the necessary test coverage.

Moreover, I've downloaded sources of 300 verified contracts from verifier.ton.org and written a tool to launch FunC on a whole database after every commit. That makes me sure that current and future changes in the compiler don't break compilation of "flagship" codebase, and when BoC hashes (Fift output) are changed, I look through to ensure that changes are expected. That codebase lives outside of ton-blockchain repository.

IDE plugin improvements

Our plugin for JetBrains IDEs and the VS Code extension also need to be improved to support such a vast number of changes.

The principal point is to introduce a setting "FunC language level" (the user selects it for a project, like "PHP language level" in PHPStorm). When it's "v0.4.x", everything should work as earlier, to continue developing contracts using an old compiler. When it's "v0.5.x", then:

  • // comments should be activated, as well as an inspection to transform ;; into //
  • impure deprecation, a quick fix to remove it
  • method_id deprecation, a quick fix to transform into get
  • new keywords pure, builtin
  • inspection for deprecated pragmas
  • updated operators priority

Note that the grammar should correspond to the latest version of FunC, and when some keywords (or other syntax) are unsupported due to the user's setting, they should be highlighted as errors.

Migration guide: from v0.4.x to v0.5.0

This is a "manual" for users to migrate their code:

  1. Download a new version of the compiler. If you use blueprint or func-js directly, just update a package to the latest version.
  2. Download the new stdlib.fc and replace your current one (in the future, stdlib will be available out of the box, and you won't need to download and store it in your project).
  3. Update your IDE plugin (available for JetBrains and VS Code).
  4. Set "FunC language level" to "v0.5.x" in the plugin settings.
  5. Prefer traditional // and /* */ comments over the old Lisp-style ones. The IDE will suggest replacing existing comments.
  6. Don't use impure, since all functions are impure by default. The IDE will suggest removing existing specifiers.
  7. Don't use method_id; use the get keyword on the left: get int seqno() { ... }. The IDE will suggest replacing existing specifiers.

Related pull requests

To documentation:

To func-js:

To highlightjs-func:

To TON verifier:

To VS Code extension:

To JetBrains IDE plugin: in progress

Fixes #971
Fixes #596
Fixes #1022

How to review this pull request?

Chronologically, by commits. Every commit contains a completed feature with green tests. No "fix" and other intermediate commits exist in a branch history.

tolk-vm added 26 commits June 14, 2024 15:22
It makes it easier to understand/debug
Also, drop some unused enum values from that cases
* fully refactor run_tests.py, make it extensible for the future
* an ability to write @compilation_should_fail tests
* an ability to launch run_tests.py for a single .fc file
* keep run_tests.js in sync with run_tests.py
* extract legacy_tests names/hashes to a separate file
  shared between legacy_tester.py and legacy_tester.js
Seeing function name in debugger
makes it much easier to delve into FunC sources
* @fif_codegen to match compiled.fif against an expected pattern
* @fif_codegen_avoid to ensure compiled.fif doesn't contain a substring
* both in Python and JS run_tests
* consider tests/codegen_check_demo.fc for examples
…(...args); }`

This will allow to easily implement camelCase wrappers aside stdlib,
even without changing hashes of existing contracts.
Also, stdlib renamings could be easily performed in the same manner,
even with arguments reordered.
@code_hash to match (boc) hash of compiled.fif against expected.
While being much less flexible than @fif_codegen, it nevertheless
gives a guarantee of bytecode stability on compiler modifications.
…anged

In auto-tests, @code_hash controls bytecode stability.
In legacy tests, expected hashes are specified in a separate file.
They work alongside Lisp-style ;; and {--}, without any #pragma.
Conceptually, a new syntax should be disabled by default
and activated using a special compiler option.
But now, we don't have an easy way to provide compiler options
in func-js, blueprint, etc.
Note, that introducing per-file #pragma is a wrong approach here,
since if we want to fire human-readable error on using '//' without pragma,
lexer should nevertheless work differently.
(this could be controlled by a launch option, but see above)
In stdlib, all existing pure functions are asm-implemented.
But since we introduced a `pure` keyword applicable to user-defined functions,
we need to check that they won't have any side effects
(exceptions, globals modification, etc.)
Note, that I have not added all builtin functions.
I filtered out strange and actually unused in practice,
like "int_at()" and similar, or "run_method0()" and similar.
(Probably, they should be dropped off even from builtins)

Also, I've modified some stdlib.fc legacy tests just to ensure
that a resulting hash doesn't change.
All tests pass: it does not affect hashes (since modifying
variables in a single expression was an error)
It changes all hashes, since the compiler needs to manipulate the stack
in a different way now.
…_Let

Before, #pragma allow-post-modification produced Op::_Let for every
tensor entity (which became non-disabled if modification really happened).
Although they are stripped off by the compiler and don't affect fif output,
they pollute intermediate "AST" representation (ops).
Now, Op::_Let is added only if var modification actually happens
(which is very uncommon for real-world code)
Before, such code `if (slices_equal() & status == 1)` was parsed
as `if( (slices_equal()&status) == 1 )`.
Note, that this change leads to hash changes of some verified contracts,
but a new priority is more expected from the user experience.
`get` keyword behaves exactly like `method_id` (auto-calc hash),
but it's placed on the left, similar to Tact: `get T name()`.

`method_id(n)` is still valid, considering it can't be invoked by name,
since a client will compute another hash.
It's supposed it will be still used in tests and in low-level code
(not to be called externally, but to be called after replacing c3).

`get(hash)` is invalid, this keyword does not accept anything.
As it turned out, PSTRING() created a buffer of 128K.
If asm_code exceeded this buffer, it was truncated.
I've just dropped PSTRING() from there in favor of std::string.
@Aho38wkw Aho38wkw left a comment

#51817

@howardpen9
Good to have!

@Dlibl
Dlibl commented Aug 16, 2024

A first-level heading

@@ -296,7 +287,7 @@ void Op::show(std::ostream& os, const std::vector<TmpVar>& vars, std::string pfx
if (noreturn()) {
dis += "<noret> ";
}
if (!is_pure()) {
tolk-vm added a commit that referenced this pull request Nov 1, 2024
All changes from PR "FunC v0.5.0":
#1026

Instead of developing FunC, we decided to fork it.
BTW, the first Tolk release will be v0.6,
a metaphor of FunC v0.5 that missed a chance to occur.
tolk-vm commented Nov 7, 2024

Closing this PR. We decided to leave FunC as is. Instead, we forked it and started developing a new language, named TOLK.
See this: #1345

@tolk-vm tolk-vm closed this Nov 7, 2024
Labels
FunC Related to FunC compiler